📊 rework regions dataset to have unique indexes #1282

larsyencken · 2023-06-28T11:43:34Z

The regions dataset contains many join tables with one-to-many relationships. In our current strict dimensions model, we have two options:

- Give them an integer index, or
- Put every column in the index

This change does the latter. It's also meant to trigger discussion about what we want in general here.
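To make the two options concrete, here is a minimal sketch using plain pandas rather than the ETL's own table classes. The `aliases` frame and its values are hypothetical and only stand in for one of the one-to-many join tables.

```python
import pandas as pd

# Hypothetical one-to-many join table: one region code maps to several aliases.
aliases = pd.DataFrame(
    {
        "code": ["USA", "USA", "GBR"],
        "alias": ["United States", "US", "United Kingdom"],
    }
)

# Indexing on "code" alone is not unique, which a unique-index check would reject.
by_code = aliases.set_index("code")
code_index_is_unique = by_code.index.is_unique

# Option 1: keep a plain integer index (a fresh RangeIndex is always unique).
by_integer = aliases.reset_index(drop=True)
integer_index_is_unique = by_integer.index.is_unique

# Option 2: put every column in the index, which leaves no data columns behind.
by_all_columns = aliases.set_index(["code", "alias"], verify_integrity=True)
all_columns_index_is_unique = by_all_columns.index.is_unique
remaining_columns = list(by_all_columns.columns)
```

Option 2 satisfies the uniqueness requirement, but the resulting table carries all of its information in the index and has no columns left.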


pabloarosado · 2023-06-29T07:32:54Z

I think that having a table like aliases, with both code and alias as indexes is not very useful.
More generally, I think so far there has been no benefit of having many separate tables. Should we instead consider keeping only the regions table (which has a well-defined index)?
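As a rough sketch of what a single table could look like, the one-to-many rows can be collapsed into one list-valued column keyed by `code`. This uses plain pandas; the column names and values are hypothetical, and storing aliases as a list is an assumption for illustration, not necessarily how the regions table stores them.

```python
import pandas as pd

# Hypothetical regions table with a well-defined, unique index.
regions = pd.DataFrame(
    {"code": ["USA", "GBR"], "name": ["United States", "United Kingdom"]}
).set_index("code", verify_integrity=True)

# Hypothetical one-to-many join table of aliases.
aliases = pd.DataFrame(
    {"code": ["USA", "USA", "GBR"], "alias": ["US", "U.S.A.", "UK"]}
)

# Collapse the one-to-many rows into one list of aliases per code.
aliases_per_code = aliases.groupby("code")["alias"].agg(list).rename("aliases")

# Attach the lists to the regions table; the index stays "code" and stays unique.
combined = regions.join(aliases_per_code)
```

With this shape, `combined.loc["USA", "aliases"]` gives all aliases for a region without needing a separate join table.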

pabloarosado · 2023-06-29T14:47:27Z

Hey @larsyencken I've removed all tables except for the regions table in the garden step, and adapted the grapher step accordingly.
There are still a few things that need to be refactored:

- etl.harmonize
- etl.data_helpers.geo
- owid_co2 garden step
- etl.fasttrack

larsyencken · 2023-06-30T08:51:57Z

Thanks Pablo!


pabloarosado · 2023-07-28T10:00:33Z

Hi @larsyencken I think this PR is ready to be merged. I've removed all tables from the regions dataset except the combined regions table (which has a unique index and all data we need about regions), and I've removed all uses of those tables. I'll wait for @Marigold to check it out before merging (hopefully it doesn't clash with any of the other metadata refactors).

larsyencken · 2023-07-28T12:39:36Z

Good work!!!

Marigold

Thanks, it looks good. However, changing regions triggers a huge ETL rebuild, so please merge it at the end of your workday to avoid disrupting others' work.
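A rough illustration of why a change to a widely used dataset can cascade: every step that depends on it, directly or indirectly, becomes stale. The graph below is hypothetical (the step names are invented and this is not the actual ETL DAG or its tooling); it only sketches the idea with the standard library.

```python
from collections import deque

# Hypothetical dependency graph: each step maps to the steps it depends on.
dag = {
    "garden/regions": [],
    "garden/population": ["garden/regions"],
    "garden/emissions": ["garden/regions", "garden/population"],
    "grapher/emissions": ["garden/emissions"],
    "garden/unrelated": [],
}


def downstream_steps(dag, changed):
    """Return every step that directly or indirectly depends on `changed`."""
    # Invert the graph so each step points to the steps that depend on it.
    dependents = {step: set() for step in dag}
    for step, dependencies in dag.items():
        for dependency in dependencies:
            dependents[dependency].add(step)

    # Breadth-first walk over the inverted graph, starting at the changed step.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


to_rebuild = downstream_steps(dag, "garden/regions")
```

In this toy graph, everything except `garden/unrelated` ends up in `to_rebuild`.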

github-actions bot assigned larsyencken Jun 28, 2023

larsyencken requested a review from pabloarosado June 28, 2023 11:43

larsyencken mentioned this pull request Jun 28, 2023

🔨 raise strict mode error for non-unique index #1279

Merged

Base automatically changed from strict-mode-unique-index to master June 29, 2023 10:14

larsyencken assigned pabloarosado and unassigned larsyencken Jun 29, 2023

pabloarosado added 3 commits June 29, 2023 15:46

Merge branch 'master' of github.com:owid/etl into regions-unique-index

04d34ba

Remove unnecessary tables from regions dataset

93deb4c

Merge branch 'master' of github.com:owid/etl into regions-unique-index

f5cf4a8

enhance(regions): Simplify code

582c366

pabloarosado added 6 commits July 28, 2023 09:25

Merge branch 'master' of github.com:owid/etl into regions-unique-index

c990aca

refactor(harmonize): Use the new combined regions table instead of the individual tables in harmonize tool

b4f38c8

refactor(geo): Use the new combined regions table instead of the individual tables in geo function

5062f4a

refactor(emissions): Use the new combined regions table instead of the individual tables in owid_co2 dataset

fee68aa

refactor(fasttrack): Use the regions table instead of old individual tables in fasttrack

0d163ae

style(regions): Improve format

205ef36

pabloarosado requested review from Marigold and removed request for pabloarosado July 28, 2023 09:53

refactor(emissions): Remove use of old regions tables in unused owid_co2 dataset

f91b496

Marigold approved these changes Aug 2, 2023


pabloarosado merged commit 76e6f42 into master Aug 2, 2023

pabloarosado deleted the regions-unique-index branch August 2, 2023 16:00
